Efficient Phrase Querying with Common Phrase Index

نویسندگان

  • Matthew Chang
  • Chung Keung Poon
چکیده

In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with negligible extra storage cost. In our experimental evaluation, a common phrase index has 5% and 20% improvement in query time for the overall and large queries (queries of long phrases) respectively over an auxiliary nextword index. Moreover, it uses only 1% extra storage cost. Compared with an inverted index, our improvement is 40% and 72% for the overall and large queries respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

What's Next? Index Structures for Efficient Phrase Querying

Text retrieval systems are used to fetch documents from large text collections, using queries consisting of words and word sequences. A shortcoming of current systems is that word-sequence queries, also known as phrase queries, can be expensive to evaluate, particularly if they include common words. Another limitation is that some forms of querying are not supported; an example is phrase comple...

متن کامل

Optimised Phrase Querying and Browsing of Large Text Databases

Most search systems for querying large document collections—for example, web search engines—are based on well-understood information retrieval principles. These systems are both efficient and effective in finding answers to many user information needs, expressed through informal ranked or structured Boolean queries. Phrase querying and browsing are additional techniques that can augment or repl...

متن کامل

Efficient Phrase Querying with an Auxiliary Index

Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...

متن کامل

Efficient Phrase Querying with an Auxiliary Index

Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006